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Abstract We introduce Bayesian Estimation Applied to Multiple Species (BEAMS), 
an algorithm designed to deal with parameter estimation when using contaminated 
data. We present the algorithm and demonstrate how it works with the help of a 
Gaussian simulation. We then apply it to supernova data from the Sloan Digital Sky 
Survey (SDSS), showing how the resulting confidence contours of the cosmological 
parameters shrink significantly. 



1 Introduction 



As demonstrated by the 201 1 Nobel prize in physics, supemovae of type la (SN-Ia) 
are one of the most important tools to study the expansion history of the universe 
|[T1|2). Supernovae are exploding stars that can be seen at extremely large distances. 
The most distant supernova currently known (designated SN 19941, a type Iln su- 
pernova) is at a distance of 1 1 billion light years |3 1. Due to the finite speed of light, 
a signal from a very distant source takes a very long time reach us, in this case some 
11 billion years (the universe itself is about 13.7 billion years old H). During the 
time that the light of the explosion travels towards us, the universe expands and the 
wavelength of the light is redshifted. By looking at spectral lines, it is possible to 
measure by how much the frequency changed. In the case of SN 19941 the redshift 
is z = 2.36, i.e. the wavelength of the light was stretched by a factor of 2.36. This 
also implies that the universe has grown by a factor of (1 + z) =3.36 over the last 
11 billion years. 

In this way it is possible to map the expansion history of the universe and to 
compare it to predictions from General Relativity (GR). But in GR, space-time is 
curved, which makes the definition of distances somewhat tricky. One way to do this 
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assumes that we know the intrinsic luminosity of an object. Such an object is called 
a standard candle, and astrophysical research has shown that SN-Ia are fairly good 
standard candles after some data processing. If we have a standard candle, then the 
observed brightness can be linked directly to a distance (called luminosity distance 
di): the more distant the standard candle, the dimmer we see it. Astronomers tend to 
use not the distance but its logarithm /i ^ logio di, for historical reasons (supposedly 
because the eye uses a roughly logarithmic scale to gauge brightness, apparently 
the magnitude system dates back to Greek astronomers). The magnitude-redshift 
diagram, a plot of /i versus z, can then be used to study the way the universe has 
behaved over most of its lifetime. 

Unfortunately not all supernovae are standard candles. There are two main 
classes that are due to very different mechanisms. Stellar explosions of type la are 
thought to occur when a small dead star (a white dwarf) exists in a binary sys- 
tem with a large, red star and begins to accrete material from that larger star At 
some point this cosmic cannibal begins to over-eat, and its stellar structure becomes 
unstable. At this point, reminiscent somewhat of the unforgettable dinner scene in 
the Monty Python sketch, the white dwarf explodes in a gigantic conflagration. Al- 
though the exact mechanism is not yet fully understood, it is plausible that such 
events always produce supernovae of comparable luminosity, as the instability al- 
ways occurs under about the same conditions. 

The second class is due to supermassive stars running out of fuel: during most 
of the life of gigantic stars, the gravitational force of their own mass that wants to 
crush them is balanced by the radiation pressure generated by burning hydrogen, 
and later heavier elements. This is possible because it is energetically favorable to 
fuse light elements together, but once the fusion process reaches lead, the energetics 
reverse and for heavier elements it is actually favorable to split (this is why fusion 
plants would use light elements while fission plants use very heavy elements). So 
stars will eventually run out of fuel, and thus will lose the radiation pressure. If the 
star is heavy enough then the matter itself also cannot support the self-gravity of the 
star, and it will collapse under its own weight. The result is called a core-collapse 
supernova, and much of the gravitational energy liberated in the collapse is radiated 
away in neutrinos and photons. Such supernovae occur unfortunately with a wide 
variety of intrinsic luminosities and so are unsuitable for distance measurements. 

Luckily the different types can be distinguished with the help of spectral analysis 
of the supernova light. But measuring a spectrum requires much more light and 
effort than simply measuring the brightness of an object, as we need to split the 
light into the different wavelengths. While taking spectra has been feasible for the 
hundreds supernovae that have been observed to date, we are now seeing a transition 
where large surveys are finding thousands of supernovae (e.g. the supernova data of 
the Sloan Digital Sky Survey, SDSS-II SN |5 , 6 1), which ai-e too many to take all the 
spectra. Future astronomical projects like the large synoptic survey telescope (LSST 
|[7 1) will find tens of thousands to hundreds of thousands of supernovae per year. It 
is impossible to follow up more than a tiny fraction of this data with spectroscopic 
observations. The "normal" observations will still provide light-curves in several 
different color bands, of the kind shown in Fig. [T] Such observations will yield 
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some idea what kind of supernova we are looking at, but they cannot provide the 
near-certainty of spectra. We are then faced with a stark choice: either we throw 
away over 99% of the data, or we develop a statistical method that is robust against 
mis-identification of supemovae. Here we will make an attempt at providing such 
a method. The material presented here is based on several publications ||8] |9] (TO) 
where more details can be found (see also limfT2l '). 



Fig. 1 Fitting liglitcurves 
to supernova data: A simu- 
lated set of SN-la light curves 
in different bands from the 
Supernova Photoinetric Clas- 
sification Challenge 1 13l . 
together with interpolating 
curves from 1 14|. 
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In the following section, we introduce the BEAMS formalism and discuss in 
more detail the role of the probabilities. We then present our choice of likelihood 
functions for the different types of supernovae in section [3j where we also provide 
some tests of the algorithm itself. In section|4]we apply the algorithm to the SDSS-II 
supernova data. In the final section we summarize the chapter, providing conclusions 
and an outlook to future work. 



2 Basic BEAMS 

2.1 The BEAMS formalism 

Let us first introduce the mathematical formalism (see also 1 8 1 for simple examples 
and basic tests). We normally want to know the posterior distribution P{Q\D) for 
parameters 9 given data D. Now assume that there is an additional dependence on 
the type of population that the data has been drawn from. For simplicity, let us 
assume that there are two kinds of data, type A (corresponding for example to type 
la supernovae) and type B (all other kinds of supernovae). Introducing a type vector 
T of the same length as the data vector D and with entries T, = A or T, = B, we 
can then write 

P(0|D)=£P(0,T|D) (1) 

T 
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where we marginalized over all 2 possible T vectors. It is obviously straightfor- 
ward to generalize this to an arbitrary number of different populations. The joint 
probability density P{9,t\D) for a given vector T is probably difficult to determine 
directly, so we use Bayes theorem to rewrite it, 

Pie,T\D) = PiD\e,T)^j^. (2) 

The "evidence" factor P{D) is independent of both the parameters and T and is an 
overall normalization that can be dropped for parameter estimation. We will further 
assume here that P{9,t) w P{9)P{t). This simplification assumes that the actual 
parameters describing our universe are not significantly correlated with the proba- 
bility of a given supernova to be of type la or of some other type. Although it is 
possible that there is some influence, we can safely neglect it for current data as 
our parameters are describing the large-scale evolution of the universe, while the 
type of supernova should mainly depend on local gastrophysics. In this case P{9) 
is the usual prior parameter probability. We will also assume P{t) to separate into 
independent factors, 

n^'IKl-^;)' (3) 

T,=A Ti=B 

for a discussion of this approximation please see ||9l. Here the product over "t, = A" 
should be interpreted as a product over those indices / in the vector T for which 
Xi= A. In other words, given a population vector T with entries "A" for SN-Ia and 
"fi" for other types, the total probability P(x) is the product over all entries, with 
a factor Pj if the j-th entry is "A" and 1 —Pj otherwise (if the j-th entry is "B"). In 
this way Pj is always a probability to find an entry Xj = A in the vector T before 
using the data. Notice that we discuss here only one given vector T, the uncertainty 
is taken care of by the outer sum over all possible such vectors. The full expression 
is therefore 

P{Q\D)^P{Q)Y^P(p\Q,x)WP,X\{\-Pj). (4) 

T T,=A Tj=B 

The factor t) is the likelihood, but now conditional on the data types. This 

means when we write down later on an expression for the likelihood, we can do it 
assuming that the type of each data point is known. 

The price to pay is that we then have to marginalize over all possible vectors T, 
evaluating a sum composed of 2^^ terms for data points. The exponential scaling 
with the number of data points means that we can in general not evaluate the full 
posterior directly, but have to use a clever approximation. Here we wiU instead make 
an additional assumption that the data points are not correlated, 

P{p\d,x)=X\P{D,\Q,x) (5) 

1=1 

= WP(p,\Q,Xi^A)WP{Dj\Q,x,^B). (6) 

T,=A T^=B 
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In the second line we have made the reasonable assumption that the probability of 
observation / does not depend on the assumed type of the object j ^ i. We have also 
indicated that the likelihood of each observation naturally splits into two popula- 
tions, those which have entry A in T, and those with entry B. In general the form 
of these two likelihood classes will be different. In toy model applications, we wiU 
usually know how they look like, but for actual data they may be unknown and we 
will have to leave some additional freedom. 

The form of Eqs. Q and (j6| allows for a huge computational simplification: the 
posterior is the sum over all possible products of the type A1A2A3 . . ., B1A2A3 . . ., 
AiB2^3 ■ ■ etc. This sum of 2^ terms can be generated simply by computing the 
product over terms Ylk{^k+ Bi^), and the posterior Eq. Q can be written as 

N 

Pid\D)ocP(e)Yl{P{Di\9,Ti =A)Pi+P{Di\d,Xi = B){l -Pi)}. (7) 

!=1 

This is the form of the BEAMS posterior that we will be using for the rest of this 
chapter. 



2.2 BEAMS probabilities 

The probabilities in BEAMS are of central importance, so it is worth to take a closer 
look by studying a simple toy model: we assume that we are dealing with two popu- 
lations (let us call them 'red' and 'blue') drawn from two normal distributions with 
means at ±0 and equal variances of — \, see the top panel of Figure [2] In ad- 
dition to the basic formalism discussed in section |2T| above, we will introduce an 
extra parameter A that can adjust the relative normalization of the probabilities. As 
we will see, this parameter allows for automatically adjusting for an unknown rel- 
ative rate of the two populations. We introduce the parameter as a change of the 
relative probability B (the Bayes factor) to be of the red or blue kind: 

P ~ P P 

B= >B = BA=A = ^. (8) 

\-P \-P \-P 



The effective, adjusted probability is then 

AP 



1 -P+AP 



(9) 



and we will in the actual applications always use this probability and allow for a 
free A that we will marginalize over. 

The equality of the variances of the two populations means that we are measuring 
the distance A —29 between the two mean values in units of the standard deviation. 
We also allow for different numbers of points drawn from the red and blue Gaussians 
through a 'rate parameter' p G [0, 1] that gives the probability to draw a red point. If 
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Fig. 2 Posterior probabilities: the top panel provides an illustration of the two toy distributions, 
in the case of 9 = 0.5, 1.0,2.0 (left to right). The bottom panel shows the probability histogram 
density plots, or number of red points with a given probability, where rfw'' ' (P) is given in Eq. jl4| 
for e = 0.5 (blue), 1 (red) and 2 (yellow). 



we draw points in total, we will then have on average pN red points and {I— p)N 
blue points. The likelihood for a set of points {xj}, with j running from 1 to A^, is 
then 

N 1 

Pi{xj}\9) ^11^ (Pe-^-^'^-'^^' + {I - P)e-'^^'^+''j^') (10) 

forP = p. 

To simplify the analysis we assume that we are dealing with large samples so 
that 9 is determined to high precision, with an error much smaller than a. In this 
case (and since this is a toy model) we can take the parameter 9 fixed. We also note 
that if we are running this in BEAMS with a true prior probability P = p then we 
would find a normalization parameter A = 1, while for P = 1/2 we would obtain 
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A = p / {1 — p), and we again assume that this parameter can be fixed to its true 
value. Then it is easy to see that if we leave the probability for point /, P,, free, we 
find a Bayes factor 



p{{xj}\e,Pi^o) ,-He+..)^ • ^ ^ 

In other words, ln(B) = xiA, just the value of the data point times the separation 
of the means. If the point is exactly in between the two distributions, x, — 0, then 
B = 1, i.e. its BEAMS posterior probability to be red or blue is equal. This means 
that if we want to think of the BEAMS posterior probability as the probability to 
be red or blue, we should update the Bayes factor with A, i.e. use B — BA, with an 
associated probability P = B/{1 +B). We also see that the probability to be red in- 
creases exponentially as x, increases. As we will see below, this reflects the fact that 
the number of red points relative to the blue points increases in the same way. The 
rapidity of this increase is governed by the separation. A, of the two distributions. 

What is the distribution of the posterior probabilities, i.e. the histogram of prob- 
ability values, and what determines how well BEAMS does as a typer in this ex- 
ample? The number of red points in an interval [x,x + dx] is just given by the 'red' 
probability distribution function at this value, times dx. To plot this function in terms 
of P we also need 

^^^^^In^^MP/g-P))^ (12) 
dP 

^AP{\-P). (13) 

dx 

The probability histograms for the red (r) and blue (b) points, normalized to p and 
1 — p respectively, then are: 



p dP 

exp ' 



^/2^AP{l~P) 2 




dN'^'-\P) 

dN^''\p)= , exp, 

^ ^ V2^AP{l-P) ^ [ 2 

We plot dN'^'^dP/p for = 0.5, 1 and 2 in the lower panel of Figure [2] We see 
how the values become more concentrated around P —\ for larger separation of the 
distributions, i.e. BEAMS becomes a "better" typer. But for very large separations 
there are also suddenly more supernovae at low P (yellow curve). The reason is that 
BEAMS does not try to be the best possible typer, instead it respects the condition 
that the probabihties have to be unbiased, in the sense that 



dN^''^ 

'dN^ ^ \\-P) \\-p 



= BA=B. (16) 
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Since BEAMS only uses the information coming from the distribution of the val- 
ues, its power, as reflected in the distribution of probability values dN{P), is given 
by how strongly the distributions are separated. If they are identical (0=0) then 
BEAMS can only return P = 1/2 while for larger 9 there is a stronger preference 
for one type over another But given the two populations, we can in principle derive 
the probability histogram by just looking at the ratio of data points of either type at 
each point in data space, there is nothing else BEAMS can do. Also, in order for the 
probabilities to be unbiased (up to the rates which are taken into account by A) if 
there are, say, 200 red points in the P = 0.9 bin and only 10 in the P = 0.8 bin, then 
we need to find about two blue points in the P = 0.8 bin, but 20 in the P = 0.9 bin. 
Although this looks like a significant misclassification problem, it is just a reflection 
of Eq. ( [T6| and is actually the desired behavior: BEAMS is not a classification algo- 
rithm (see e.g. 1 13 15 , 14| for efforts in that direction) but instead a way to compute 
the posterior pdf of the parameters 9. The property of unbiased probabilities is re- 
quired to get unbiased parameter constraints, and indeed for that purpose we never 
classify any data points. Instead we leave them in a superposition of different types, 
weighted by the associated probabiUties as encoded in the marginalization over T in 
Eq. Q. 



3 Application of BEAMS to supernova observations 

In this section we will complete first our discussion of the posterior (j7]i by providing 
explicit expressions for the two likelihood functions. We will say a few words on 
the numerical strategy used to explore the posterior parameter distribution and check 
the performance of the algorithm with a range of tests. This section and the next is 
based on the results obtained in ifTOl . 

Before entering the likelihood discussion, we would like to remind the reader 
that supernova data is given in the form of a distance modulus as a function of 
redshift z- In addition, the distance modulus depends on a set of cosmological and 
nuisance parameters 9. The cosmological parameters {//q , ■^2,,, , 12^ } are the true 
quantities of interest for us. Here Hq is the expansion rate of the universe today (the 
Hubble constant), and the Qj are the relative energy densities in matter m and a 
cosmological constant A. See for example the book by Scott Dodelson ifTBi for a 
good introduction to cosmology. 

The distance modulus is related to the cosmological model via; 

^l{z,9)^5logdLiz,9)+25, (17) 

where 

dLiz,9) = ^^11^^ sinh ( JOk r (18) 
' VihHo V Jo E{z)) 

is the luminosity distance measured in Megaparsec (Mpc), and the normalized ex- 
pansion rate is given by 
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E{z) = ^ = Ji2„,il +zy + Qk{i+z)^ + QA- (19) 

The relative energy densities of matter (i2„,), curvature (Qjc) and the cosmological 
constant (Qa) obey the relation f2„, + Qk + = 1^ which we use to express 
in terms of the other i2's. Notice that < is possible, in which case y/£2k in 



Eq. (18 1 becomes imaginary and the hyperbolic sine becomes a normal sine func- 
tion instead - the limit 12,^ = is also well defined. The distance modulus is defined 
as the difference between the absolute and apparent magnitudes of the supernova, 
H=m — M, with additional corrections made to the apparent magnitude for the cor- 
relations between brightness, color and stretch and a K-correction term related to 
the difference between the observer and rest-frame filters, for example. The correc- 
tions are typically made within the model employed in a light-curve fitter, such as 
that for MLCS2k2. 

In this application of BEAMS we have assumed that the distance modulus jj. is 
obtained directly from the light-curve fitter (such as is the case for fitters which 
use the MLCS2k2 light-curve model), however this is not an implicit assumption. 
In the case of the SALT light-curve fitter, the distance modulus would be recon- 
structed using a framework such as that outlined in ifTTl before including in the 
BEAMS algorithm. We will also always assume that the distance modulus has been 
obtained under the assumption that all supernovae are of type la. This means that it 
is straightforward to write down the likelihood for type-la supernovae, but that we 
need to do extra work for the non-la supernovae. It would of course be preferable 
to have distance moduli for all possible supernova types, but this is still an active 
research topic in astronomy. 



3.1 Choice of likelihood 

3.1.1 Likelihood of type-la supernovae 

Following the standard astronomical literature, the la likelihood is modeled as a 
Gaussian probability distribution function (pdf) for the observed distance modulus 
jXi centered around the theoretical value IJ.{z,9) with a variance a^g^ f. 

Pi,.\0,r. = 1) ^ -=^exp ( j^'- f^'))' ] . (20) 



'tot.i / 

Again following standard practice, we model the error on the distance modulus 
of each supernova as a sum in quadrature of several independent contributions, 

(^tot,, = c^M,' + ^T^ + ^M,z' (21) 
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where ff^ , is the error obtained from fits to the SN hght-curve, Or is the character- 
istic intrinsic dispersion of the supernova population, which we add as an additional 
global parameter to the vector 9 with Jeffreys' prior. The constraints do not depend 
strongly on the prior used for the intrinsic dispersion. The error term (7^^ converts 
the uncertainty in redshift due to measurement errors and peculiar velocities into an 
error in the distance of the supernova as: 

""^■^ = h^M ^(TT^ (22) 

with ffj. as redshift error, and Vp^c as the typical amplitude of the peculiar velocity of 
the supernova, which we take as 300 kms"' lfT8ll6l. 



3.1.2 Likelihood of all other supernovae 

The general form of the non-SNIa likelihood will be complicated since there are 
several sub-populations. Given the limited number of non-SNIa in the SDSS-II SN 
data set however, (see Figure |8]l we will model it with a single mean and a dis- 
persion. If one chooses to describe a population using only a mean and a variance, 
statistically the least-informative (maximum entropy) choice of pdf in this case is 
also a Gaussian lfT9l . 

Pi,,\0,r.-0) ^ ( j^^-niz.B)f \ . (23) 

As we do not know the mean 77 and variance s^^^ , of the non-la population, we 
describe them with additional parameters. We will keep the parametrization of the 
mean very general (see below) but for the variance we restrict ourselves to the same 



form as for the Type la supernovae, Eq. ( 2 1 1, but with a potentially different intrinsic 
dispersion described by an independent parameter (again with a Jeffreys' prior). 
We assume that the measurement errors and the contribution from the peculiar ve- 
locities enter in the same way for Type la and other supernovae and so keep these 
terms identical. 

We do not know what to expect for the mean of the non-la pdf and so we allow for 
a range of possibilities. As the brightness is linked to the luminosity distance through 
Eq. ( [T7| , we describe the expected non-la distance modulus (as provided by the 
light-curve fitter which assumes actually a type-la supernova) as a deviation from 
the theoretical value, T7(z,0) — I-L{z,9) + Y{z), where we consider the following 
Taylor expansions of the difference as a function of redshift: 

3 

r(z) = 77(z,0)-Ai(z,0)-L («'^'') / {l+dz). (24) 

1=0 
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We consider the cases where we set different combinations of the parameters {ai,d), 
to zero, and employ a criterion based on model probability to decide which of these 
functions to use. We note that the explicit link of rj{z,9) to jj. (z, 0) carries a risk that 
the non-la likelihood can influence the posterior estimation of the cosmological pa- 
rameters. For this reason we verify that the contours do not shift when we set directly 
ri{z,9) = Y{z), although we will need a higher-order expansion in general (and of 
course the recovered parameters of the function Y{z) will change). In general, as 
long as the basis assumed has enough freedom to fit the deviation in the distance 
modulus of the non-la population from the la model, the inferred cosmology will 
not be biased. 

For a cosmological analysis we just marginalize over the values of the parameters 
in r(z), but these parameters contain information on the distribution of non-la type 
SN and thus their posterior is of interest as well, allowing us to gain insight into the 
distribution characteristics on the non-la population at no additional 'cost'. 

The simple binomial case considered here, where the non-la population consists 
of all types of core-collapse SNe, is probably too simplistic to accurately describe 
the distribution of non-la supernovae. In general one could include multiple popu- 
lations, one for each supernova type, which would yield a sum of Gaussian terms 
in the full posterior. In addition, the forms describing the distance modulus of the 
non-la population are chosen to minimize the cosmological information from the 
non-la's (we always test for a deviation from the cosmological distance modulus), 
however, the parameterization of the non-la distance modulus could be improved by 
investigating the distance modulus residuals from simulations, as the major contri- 
butions to the distance modulus residuals appear to be the core-collapse luminosity 
functions, along with the specific survey selection criteria and limiting magnitude, 
see ||20| . While current SN samples do not include a large enough sample of non-la 
data to test for this, larger data sets (such as the data from the BOSS SN survey) will 
allow for a detailed analysis of the number (and form) of distributions describing 
the contaminant population. 



3.2 Numerical methodology 

In this work, the BEAMS algorithm is implemented within a Markov Chain Monte 
Carlo (MCMC) framework, and the Metropolis-Hastings li21J acceptance criterion 
was used. We use the cosmological parameters {Q,„,Qa,Ho} in the case of the 
approach on the spectra and cut samples described below, and add additional 
parameters {A,Ox,ST:,a} in the case of the BEAMS application. The parameters 
a = {a^ ,a^};d = = are for the quadratic model, in the other models for 
T{z) we adjust the parameters accordingly. The chains were in general run for 
around 100 000 steps per model; this was sufficient to ensure convergence. We 
test for convergence using the techniques described in ll22l . We impose positivity 
priors on the energy densities of matter and dark energy, and impose a flat prior 
on the Hubble parameter between 20 < Hq < 100 kms^'Mpc^'. The Hubble pa- 
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rameter is marginalized over given that we do not know the intrinsic brightness of 
the supernovae, but through the distance modulus are only sensitive to the relative 
brightness of the supernovae. We impose broad Gaussian priors on the parameters 
of the non-la likelihood function, and step logarithmically in the probability nor- 
malization parameter A, as well as the intrinsic dispersion parameters of both the la 
and non-la. distributions. 



3.3 Comparison to standard methods 

The primary difference between BEAMS and current methods is that the latter either 
require that all data are spectroscopically confirmed, or apply a range of quality cuts 
based on selection criteria. Here we will compare the performance of BEAMS to 
these two approaches, by processing the data that pass the required selection criteria 
using the la likelihood, Eq. (|20|. We will hereafter refer to this as the approach. 




We use the following sample^ 

• spectro sample: 

The sample containing only spectroscopically confirmed supernovae. In addi- 
tion to spectroscopic confirmation we will also apply a cut on the goodness- 
of-fit probability from the light-curve templates within the MLCS2k2 model, 
Pfit > 0.01, and a cut on the light-curve fitter parameter A > —0.4, where A is 
a parameter in the MLCS2k2 model describing the light-curve width-luminosity 
correlation. MLCS2k2 was trained using the range —0.4 < A < 1.7 f23l , hence 
we restrict the sample to A > —0.4, which is a cut typical in current SN surveys, 
and so we introduce the cut to provide comparison between datasets. We process 
this spectro sample using the approach. 

• cut sample: 

This larger sample is selected both by removing 5(T outliers from a moving av- 
erage fit to the Hubble diagram including both photometric and spectroscopi- 
cally confirmed data and applying a cut to the sample, including only data with 
a high enough probability, Ptyper > 0.9 (where the probability comes from a gen- 
eral supernova typing procedure, such as PSNID, described in ll24l |25l ). We 
choose to use the PSNID probabilities to make the probability cut on the sample 
(^typer > 0.9); if the MLCS2k2 probabilities had themselves been used to make a 
cut sample, then objects would only be included if they had probabilities greater 
than, for example. Put > 0.9 In addition, we impose a cut on the goodness-of-fit 
of the light-curve data to the Type la typer, xfc < 1-8, a cut on the goodness- 
of-fit probability from the light-curve templates within the MLCS2k2 model, 
Pfit > 0.01, and a cut on 4 > —0.4. In this cut sample case we then use standard 
the x^ cosmological fitting procedure on the sample, and so set the la probability 
of all points to one. 



We apologize for the use of technical jargon in the description of the samples. 
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• photo sample: 

This sample is the one to which BEAMS will be applied, and will include all 
the photometric data with host galaxy redshifts. As in the previous two cases, we 
include only data which have P^t > 0.01,4 > —0.4. 

Note that the spectra sample will be included in all the three samples described 
above. While the spectra and cut samples have by definition Pja — 1 (as they are 
analyzed in the approach), we do not set the probabilities to unity when applying 
BEAMS to the full sample - the spectra subsample within the larger phata sample 
will be treated 'blindly' by BEAMS. The spectra sample is the one most similar to 
current cosmological samples, and will be used to check for consistency in the de- 
rived parameters between BEAMS applied to the photo sample and the approach 
on the spectra sample. 



3.4 Tests on simulated data 

To test the BEAMS algorithm explicitly we need a completely controlled sample, 
where all variables (such as the non-la model and the SN-Ia probabilities) are di- 
rectly known and where we can verify that the algorithm is able to recover them 
correctly. In addition, we use this data set to check that we recover the correct 
shape of the non-la distance modulus r]{z) since the true 77(2, 0) is known for this 
sample only. We simulate a population of 50 000 SNe, with redshifts drawn from 
a Gaussian distribution, z ^ ./K (0.3, 0.15), and distance moduli drawn from a flat 
ACDM universe with (i2„,,i2yi,//o) — (0.3,0.7,70). The non-la population includes 
a contribution to the distance modulus, T7(z, 6) — 0) + a'z + a^z^, where 
we choose (fl*',fl',fl^) = (1.5,1,-3). We assign P/„ probabilities from a model 
dN/dPia = AiPja +A2P/„; with Al = -0.9; A2 = 1.9. We then assign the types 
from the two samples (of la's and non-la's), i.e. we choose a random number t and 
if / < Pia (i.e. the type also follows the same relationship as the probability) we take 
the data point to be a la, and if f > Pja we assign it as a non-la, until we run out of 
data points from either sample. This procedure reduces the sample size from 50 000 
to 37529, but guarantees unbiased probabilities. 

We assign a 'measurement error' to each distance modulus of On =0.1; add 



an intrinsic error Ox — 0.16 and a peculiar velocity error based on Eq. (21 1, with 
Vpec = 300kms^'. We then randomly scatter the data points based on the total er- 
rorbar To mimic what happens in a light-curve fitter, only the measurement error is 
recorded, however When performing parameter estimation on the points we either 
add this measurement error in quadrature to the other terms whose amplitudes are 
fixed (in the case of the approach), or we estimate the magnitudes of the intrin- 
sic dispersion when we apply the BEAMS algorithm. We randomly choose 10% 
of the la data and assign spectra status; this represents the data that are followed 
up by large telescopes on the ground. This spectra sample is drawn so that we can 
compare the BEAMS-estimated result to the x^ approach on a smaller sample. The 
data are shown in Figure |3] In the BEAMS analysis we checked on a small number 
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0.2 0.4 0.6 0.8 

Redshift z 

Fig. 3 Gaussian data: 37529 points simulated according to a Gaussian distributions around a 
distance modulus in a flat ACDM model for the la population (25000 points) and with extra terms 
up to quadratic order in redshift for the non-la population. The points are colored according to their 
simulated probabilities from blue (low probability) to dark brown (high probability). 



of simulated samples that the results obtained were unbiased - a full Monte Carlo 
simulation of bias is beyond the scope of this work. 

3.4.1 Performance on cosmological parameters 

We show in Fig.|4]the 2(7 confidence contours in the Q,n,^A plane when analyzing 
the Gaussian simulation with BEAMS (filled contours), the 'spectroscopic' sample 
(dashed contours) and the 'cut' sample (solid contours). We see that both BEAMS 
and the spectroscopic sample are consistent with the input cosmology (filled brown 
square at i2,„ = 0.3, Qa = 0.7), but BEAMS can use the additional information in 
the data and is able to provide much tighter constraints. The cut sample has also 
smaller contours than the spectroscopic sample (although larger than BEAMS) but 
is biased with respect to the input cosmology. 

As discussed in |8| for the one-dimensional case, the effective number of SNe 
that result when applying BEAMS scales as the number of spectroscopic SNe and 
the average probability of the dataset multiplied by the remainder of the photometric 
sample, (7 — ^ a/ y^A^spec + (-Pia)A^photo- In the two-dimensional case, the square root 
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Fig. 4 Analysis of Gaussian simulation: We show the 2<J contours in the Q„,,Q/^ plane for the 
simulated Gaussian data. The BEAMS constraints (filled contours with best fit indicated by the 
black open square) are consistent with the input cosmology (brown square), as is the 'spectro- 
scopic' sample (dashed contours, best-fit indicated by brown cross). The 'cut' sample on the other 
hand is biased by over 2(7 in spite of a relatively stringent cut on probability of Pcut = 0.9; stronger 
cuts will recover the true cosmology at the cost of sample size. 



would be removed as the area of the elhpse scales with the increase in the effective 
number of supemovae. In our applications we have, however, not used the fact that 
we know that some points are confirmed as Type la. In other words, the probability 
of each data point was taken from the light-curve fitter and was not adjusted to one 
or zero depending on the known type. Hence we expect the size of the contours in 
the ; — j plane to scale as 

pl/2 

We compute the size of the error ellipse for various Gaussian simulations as a func- 
tion of the size of the simulation, shown in Figure |5] for one particular model of 
the probabilities, and hence one value of {P). We impose a prior on the densities, 
and hence the ellipses are not closed for smaller samples. For large enough sam- 
ple sizes the ellipse is closed and we observe that the error ellipses scale in area as 
o= 1 /{Pia)N, which is consistent with earlier results |8|. In general then, one would 
obtain a different constant factor (P) in Figure|5]for different simulated probability 
distributions. 
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Fig. 5 Errors scale with ^ 
number of SNe: the size 
of the error ellipse, approxi- 
mated by the square root of 
the determinant of the two- 
dimensional chain of Q.„, , Qj^ 
shows the reduction in size 




with increasing the number of lo^ 
SNe in the simulation. Number of SNe 



3.4.2 Constraining T{z) forms for the non-la population 

The Gaussian simulation described in this section uses a quadratic model for the 
differences between the standard A CDM and the non-la distance modulus. We 
test here that assuming a different functional form while performing parameter es- 
timation does not significantly bias the inferred cosmology. We define the effective 
as — 21n^, where the posterior is given by Equation (j7]l, and we provide 
values relative to the simplest linear model for T{z). The goodness-of-fit of the dis- 
tributions to the data is summarized in Table [T] In Figure |6] we show that BEAMS 



Fig. 6 Different r(z) for 
the non-la likelihoods: la 

constraints in the Q.,„,Qji^ 
plane for different versions of 
the non-la distance modulus 
function, for the Gaussian 
simulation. We simulated 
a quadratic model, and ran 
BEAMS assuming a linear, 
quadratic, cubic and Fade 
form for T{z), as described 
in Section p.l.2| As expected, 
the linear model does not have 
enough freedom to capture the 
non-la distribution. 



0.7 



0.5 



Level 1 - Gaussian simulation 




0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 



is reasonably insensitive to the assumed form of the non-la likelihood, provided it 
is allowed enough freedom to capture the underlying model. A linear model fails to 
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recover the correct cosmology, as it does not have enough freedom to recover the 
difference between the la and non-la distribution. It correspondingly has a very high 
relative to the other approaches. The higher-order functions recover consistent 
cosmologies, and the of these models improves by A'x} < 0.5, even though the 
models have increased the number of parameters by one. 



Table 1 Comparison of non-la likelihood models for Gaussian simulation: x~ values for the 
fits using various forms of the non-la likehhood for the Gaussian simulations, where the true un- 
derlying model is quadratic. The constraints on Q,„,Q/^ are shown in FigurepI Ax^ff is difference 
in the effective between a given model and the linear case, which has x^ff = 42526.2. 



Model 




Parameters 


Y{z) = az + c 





2 


r(z) = az + bz^ + c 


-192.9 


3 


r{z)=az + bz^ + cz^ +d 


-193.3 


4 


r{z) = {az + bz^ + c)/{\+dz) 


-193.4 


4 



3.4.3 Dependence on Probability 

The BEAMS algorithm naturally uses some indication of the probability of a data 
point to belong to the la population, whether it is some measure of the goodness- 
of-fit of the data to a type la light-curve template, or something more robust such 
as the relative probability that the point is a la compared to the probability of being 
of a different type. By including a normalization factor, we can correct for general 
biases in the probabilities of the la points. One might still question, however, how 
sensitive BEAMS is to the input probability of the objects. For the Gaussian simula- 
tion, where we assign the probabilities, Pi^, we can directly change the relationship 
between the true underlying distribution of the types (i.e. the ratio of las to non- 
las in the sample) and the input probability value (the number we input into the 
BEAMS algorithm as the Pia). If the probabilities are unbiased then the distribution 
of types should follow the probability distribution of the data, in other words 60% of 
the points with Pia = 0.6 should be Type la SNe. This is the standard case. We then 
modify the probabilities by assigning a probability of P^^ — 0.3 to all points (which 
we know will be biased since the mean probability of the sample is 0.667). 

We compare the constraints in the two cases in Figure |7] If we ignore all prob- 
ability information and set it to a (biased) value of Pi^ = 0.3, the probability infor- 
mation is essentially controlled by the normalization parameter A tends to a value 
of 4.7, which, when inserted into Equation (|9]l yields a 'normalized' probability of 
Pin = 0.668. Hence BEAMS uses the normalization parameter to remap the mean 
of the given probabilities to ones that have a mean that fits the true unbiased prob- 
abilities. In correcting for this effect, BEAMS manages to recover cosmological 
parameters consistent with the unbiased case. 
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Fig. 7 BEAMS corrects for biased input probability: the marginalized one-dimensional like- 
lihood for the normalization parameter A (top panel) and estimated contours (bottom panel) for 
Level I Gaussian simulation under two forms of the probability distribution. The pink curve and 
contours correspond to the nominal case, where the probabilities are generated in a linear model, 
and the types are assigned according to the probabilities. The purple dashed contours correspond 
to assigning a probability of f\a = 0.3 to all points. The dashed vertical lines show the expected 
value of the parameter A such that the true input mean probability of Pi^ = 0.667 is recovered. Note 
that the x-axis in the top panel has been shortened to allow for comparison of the two distributions. 



4 Results from the SDSS-II SN data 



The Sloan Digital Sky Survey Supernova Search operated for three, three-month 
long seasons during 2005 to 2007. We use the photometric supernovae from all three 
seasons of the SDSS-II SN survey which also had host galaxy redshifts from the 
SDSS survey. The analysis and cosmological interpretation of the first season of data 
(hereafter Fall 2005) are described in |l26l|6l[l8l and EJ). The SDSS CCD camera 
is located on a 2.5 m telescope at the Apache Point Observatory in New Mexico. 
The camera operated in the five Sloan optical bands ugriz [28 1. The telescope made 
repeated drift scans of Stripe 82, a roughly 300 square degree region centered on the 
celestial equator in the Southern Galactic hemisphere, with a cadence of roughly 
four to five days, accounting for problems with weather and instrumentation. 

The images were scanned and objects were flagged as candidate supernovae ll24ll . 
Candidate light-curves were compared to a set of supernova light-curve templates in 
the g,r,i bands (consisting of both core-collapse and Type la supernovae) as a func- 
tion of redshift, intrinsic luminosity and extinction. Likely SNIa candidates were 
preferentially followed up with spectroscopic observations of both the candidates 
and their host galaxies (where possible) on various larger telescopes (see 1,241). 

In addition to the spectroscopically confirmed SNela discovered in the SDSS- 
II SN, many high-quality candidates without spectroscopic confirmation (i.e. only 
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Fig. 8 SDSS-II SN data: the photometric sample of the full three seasons of SDSS-11 SN survey. 
The 792 points are all those with host galaxy spectroscopic redshifts. The sample includes 297 
spectroscopically confirmed SNe, and the data points are color coded using probabilities from the 
PSNID typer OlllS] from low (blue) to high (dark brown). 



photometric observations were made of the SNe) but which, by chance, have a host 
galaxy spectroscopic redshift, are present in the SDSS sampla^ 

We include these SNe in both the cut sample and the full photo sample, but do 
not set the probabilities of the spectroscopically confirmed spectro sample points to 
unity in the latter These supernovae are fit with the MLCS2k2 model ll23l to obtain 
a distance modulus for each supernova, assuming the supernova is a type la. 

As outlined in Section 3.3 we impose the standard selection cuts on the proba- 
bility of the fit to the MLCS2k2 light-curve template ffit > 0.01 and A > -0.4 to all 
data, and require that the data used have spectroscopic host galaxy redshift informa- 
tion. Applying these cuts to the full three year data yields a photometric sample of 
792 SNe, with a spectroscopic subsample of 297 SNe. The spectro sample consists 
of the objects which have been spectroscopically confirmed by other ground-based 
telescopes, while the cut sample consists of the data points which have a typer prob- 



The BOSS survey recently obtained host galaxy redshifts of all high-quality SN candidates from 
all three seasons of the SDSS-II Supernova Search. This work does not use the additional BOSS 
information and only uses the host galaxy redshifts obtained during the running of the SDSS-II 
survey. 
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ability of Ptyper > 0.9 and a goodness-of-fit to the light-curve templates within the 
PSNID typer ||24l|23, xl < 1-8. 
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Level III - SDSS-II data 
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Fig. 9 Analysis of the SDSS-II SN data: We show the 2a contours in the Q„,Qa plane for the 
SDSS-II SN data. All the constraints - BEAMS (filled contours with best fit indicated by black 
open square), the 'spectroscopic' sample (dashed contours, best-fit indicated by brown cross) and 
the 'cut' sample - are consistent with the concordance cosmology (filled brown square). The best- 
fit BEAMS point is given by the black square, while the best-fit cosmology from the spectroscopic 
data is indicated by the brown cross. BEAMS provides the smallest contours on the SDSS-II data 
set, while still being consistent with the constraints from the spectroscopic subsample. 



As is shown in Figure|9] BEAMS estimates parameters consistent with the spec- 
tra sample as well as the concordance cosmology in the case of the SDSS-II SN 
data. Moreover, the BEAMS contours are three times smaller than when using the 
spectra sample alone. In the Gaussian simulations (see Fig.|4]l, the BEAMS contours 
using all the points are ~ 16% of the size of the spectra sample. This highlights the 
potential of photometric supernova cosmology to drastically reduce the size of er- 
ror contours with larger samples while remaining unbiased relative to the 'known' 
spectroscopic case. 



5 Conclusions and outlook 



Bayesian Estimation Applied to Multiple Species (BEAMS) is a statistically robust 
method for parameter estimation in the presence of contamination. The key power 



BEAMS 



21 



of BEAMS is in the fact that it makes use of all available data, hence reducing the 
statistical error of the measurement, whether or not the purity of the sample can 
be guaranteed. Rather than discarding data, the probability that the data are "pure" 
is used as a weight in the full Bayesian posterior, reducing potential bias from the 
interloper distribution. 

Here we have presented the algorithm, and discussed in some detail the role 
of the probabilities. We have tested BEAMS on an ideal Gaussian simulation of 
two populations and demonstrated that it recovers the input parameters. We have 
also shown that the BEAMS errors scale as expected with sample size, and that it 
provides smaller errors than some of the traditional approaches. Using the Gaussian 
simulation we have further verified that we can detect the correct form of the non-la 
likelihood and correct for a bias in the probabilities. 

We have then applied BEAMS to the SDSS-II SN data set of 792 SNe, using 
photometric data points with host galaxy spectroscopic redshifts, and showed that 
the BEAMS contours are three times smaller than those obtained when using only 
the spectroscopically confirmed sample of 297 SNe la. 

We have restricted ourselves to the binomial case of a SN-Ia population and one 
general core-collapse, or non-la, population. While this assumption is valid for the 
SDSS-II SN data, we expect that for larger samples a more complicated model with 
at least two separate non-la Gaussians is more appropriate. On the other hand, large 
supernova surveys will not only increase the total number of type la SNe candidates, 
but will also allow to investigate systematics about the SNe populations directly. The 
BEAMS algorithm is designed to include and adapt to information about the non- 
la population easily. By adapting the form of the non-la population, and including 
more than one population group, one could use BEAMS to gain insight into the 
contaminant distribution. 

As we move into the era of huge astronomical surveys that will provide data on 
thousands of supernovae, BEAMS provides a platform to learn more about the SN 
populations while at the same time tackling the fundamental questions about the 
constituents of the universe. 
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